Show the code
import pandas as pd
import polars as pl
import numpy as np
from lets_plot import *
LetsPlot.setup_html(isolated_frame=True)Through history names at one point or another were considered new or novel. Overtime certain names came into a pool of commonly used names that parents would then pull from rather than coming up with a new one.
With the rise of Christianity, certain trends in naming practices manifested. Christians were encouraged to name their children after saints and martyrs of the church. These early Christian names can be found in many cultures today, in various forms. These were spread by early missionaries throughout the Mediterranean basin and Europe.
By the Middle Ages, the Christian influence on naming practices was pervasive. Each culture had its pool of names, which were a combination of native names and early Christian names that had been in the language long enough to be considered native. Now the question isn’t just what are native names but what are the greatest influences of parents naming their children? ref
The data is stored in a CSV file that gives the number of times a specific name was given to a child in a specific year.
import pandas as pd
import polars as pl
import numpy as np
from lets_plot import *
LetsPlot.setup_html(isolated_frame=True)# Data Dictionary: https://github.com/byuidatascience/data4names/blob/master/data.md
# df_url = pd.read_csv("https://github.com/byuidatascience/data4names/raw/master/data-raw/names_year/names_year.csv")
df = pd.read_csv("names_year.csv")
# df_url_pl = pl.read_csv("https://github.com/byuidatascience/data4names/raw/master/data-raw/names_year/names_year.csv", infer_schema_length=100000)
df_pl = pl.read_csv("names_year.csv", infer_schema_length=100000)# Pandas
print(f"Creating the graph using a Pandas DataFrame\n")
df_ell = df.query('name == ["Elliot"]')
ell = (
ggplot(data=df_ell)
+ geom_line(mapping = aes(x = 'year', y = 'Total', color='name'))
+ scale_color_manual(values={'Elliot': 'blue'})
+ scale_x_continuous(limits=[1950,2025], format = 'd', expand=[0,0])
+ scale_y_continuous(format='d')
+ labs(
title="Elliot... What?"
)
+ theme(panel_grid=element_line(linetype=1, color = "white"),
panel_background=element_rect(fill="#e5ecf6", linetype=0),
axis_line_x=element_blank(),
axis_ticks_x=element_blank()
)
+ geom_text(x = 1982, y = 1300, label="E.T. Released", hjust="right", size=6)
+ geom_text(x = 1985, y = 1300, label="Second Released", hjust="left", size=6)
+ geom_text(x = 2002, y = 1300, label="Third Released", hjust="left", size=6)
+ geom_vline(xintercept=1982, color="red", linetype="dashed")
+ geom_vline(xintercept=1985, color="red", linetype="dashed")
+ geom_vline(xintercept=2002, color="red", linetype="dashed")
)
display(ell)
ggsave(ell, filename="elliot_pandas.svg", path="./plots/")
# Polars
print(f"\nUsing a Polars DataFrame")
df_ell_pl = df_pl.filter(pl.col('name').is_in(["Elliot"]))
ell_pl = (
ggplot(data=df_ell_pl)
+ geom_line(mapping = aes(x = 'year', y = 'Total', color='name'))
+ scale_color_manual(values={'Elliot': 'blue'})
+ scale_x_continuous(limits=[1950,2025], format = 'd', expand=[0, 0])
+ scale_y_continuous(format='d')
+ labs(
title="Elliot... What?"
)
+ theme(panel_grid=element_line(linetype=1, color = "white"),
panel_background=element_rect(fill="#e5ecf6", linetype=0),
axis_line_x=element_blank(),
axis_ticks_x=element_blank()
)
+ geom_text(x = 1982, y = 1300, label="E.T. Released", hjust="right", size=6)
+ geom_text(x = 1985, y = 1300, label="Second Released", hjust="left", size=6)
+ geom_text(x = 2002, y = 1300, label="Third Released", hjust="left", size=6)
+ geom_vline(xintercept=1982, color="red", linetype="dashed")
+ geom_vline(xintercept=1985, color="red", linetype="dashed")
+ geom_vline(xintercept=2002, color="red", linetype="dashed")
)
display(ell_pl)
# the sv is simply to suppress the output
sv = ggsave(ell_pl, filename="elliot_polars.svg", path="./plots/")Creating the graph using a Pandas DataFrame
Using a Polars DataFrame